A Robust Free Size OCR for Omni-Font Persian/Arabic Printed Document Using Combined MLP/SVM
Authors
Abstract
Optical character recognition of cursive scripts presents a number of challenging problems in both the segmentation and recognition processes, which has attracted much research in the field of machine learning. This paper presents a novel approach based on a combination of MLP and SVM to design a trainable OCR for Persian/Arabic cursive documents. The implementation results on a comprehensive database show a high degree of accuracy, which meets the requirements of commercial use.
Similar Resources
Optical Font Recognition from Projection Profiles
• Recognition of logical document structures [1], where knowledge of the font used in a word, line, or text block may be useful for defining its logical label (chapter title, section title or paragraph).
• Document reproduction, where knowledge of the font is necessary in order to reproduce (reprint) the document.
• Document indexing and information retrieval, where word indexes are generally p...
FONT DISCRIMINATION USING FRACTAL DIMENSIONS
One problem related to OCR systems is the discrimination of fonts in machine-printed document images. Solving this task improves the performance of general OCR systems. The methods proposed in this paper are based on various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...
A Modified Self-organizing Map Neural Network to Recognize Multi-font Printed Persian Numerals (RESEARCH NOTE)
This paper proposes a new method to distinguish printed digits, regardless of font and size, using neural networks. Unlike our proposed method, existing neural-network-based techniques are only able to recognize the fonts they were trained on. These methods need a large database containing digits in various fonts. New fonts are often introduced to the public, which may not be correctly recognized by the Opti...
A font and size-independent OCR system for printed Kannada documents using support vector machines
This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system would be the scanned image of a page of text and the output is a machine editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character level pieces. The segmentation algorithm ...
Support Vector Machine for Persian Font Recognition
In this paper we examine the use of approaches based on global texture analysis for Persian font recognition in machine-printed document images. Most existing methods for font recognition make use of local typographical features and connected component analysis. However, deriving such features is not an easy task. Gabor filters are appropriate tools for texture analysis and are ...